Abstract

The expected number of pairwise comparisons needed to learn a partial order
on $n$ elements is shown to be at least $n^2/4 - o(n^2)$, and an algorithm is given
that needs only $n^2/4 + o(n^2)$ comparisons on average. In addition, the optimal
strategy for learning a poset with four elements is presented.
1 Introduction

Traditional sorting is about learning a linear order. Its complexity is often measured
by the number of pairwise comparisons a sorting algorithm needs on average, which
is known to be $\Theta(n \log n)$. It is a straightforward generalization to ask for algorithms
which learn a partial order by pairwise comparisons, a task that could be termed partial
sorting. Let us designate the set of all strict partial orders on $n = \{0, 1, \ldots, n-1\}$
by $\mathcal{P}(n)$. This set has $2^{n^2/4 + o(n^2)}$ many elements (cf. [||]), and each pairwise
comparison of elements of $n$ has at most three possible results. A trivial lower bound
for the expected number of comparisons needed to learn some $P \in \mathcal{P}(n)$ is therefore
$\log_3 |\mathcal{P}(n)| = \frac{n^2}{4 \log_2 3} + o(n^2)$, since in a rooted tree with $l$ leaves in which each node
has at most $r$ children, the average leaf-root distance is at least $\log_r l$.

In this paper, a lower bound of $n^2/4 - o(n^2)$ is proved, which is larger than the above
by a factor of $\log_2 3 \approx 1.58$. In other words, any learning algorithm for large posets
must expect to compare at least about half of all pairs. Moreover, it will be shown
that there are indeed algorithms whose expected running time is just $n^2/4 + o(n^2)$. Both
results use the fact that for (very) large $n$, almost all posets have a specific three-leveled
shape.

To underline the asymptotic nature of the results presented below, Figure 1 shows as
a contrast the optimal poset learning strategy for $n = 4$, which has been determined by
a recursive computer search. Each node is a possible state (up to (dual) isomorphisms),
and the node's diagram shows all relations (as in a Hasse diagram) and all
incomparabilities (represented by dotted lines) known in that state. The edges show which states
can arise from which others, where loops indicate dualization. Those states in which
there is only one possible type of comparison are framed with thinner lines, so the other
nodes already determine the actual strategy. Its average running time is 5.461
comparisons, compared to 6 pairs and a trivial lower bound of $\log_3 |\mathcal{P}(4)| = \log_3 219 \approx 4.905$
comparisons. The optimal strategy for $n = 5$ takes 8.744 comparisons on average,
while $\binom{5}{2} = 10$ and $\log_3 |\mathcal{P}(5)| = \log_3 4231 \approx 7.601$.
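The poset count 219 and the trivial lower bound quoted above can be checked by brute force for very small $n$. The following is only a sketch under stated assumptions: the function names `is_strict_partial_order` and `all_posets` are illustrative and not part of the paper, and the enumeration over all subsets of ordered pairs is feasible only up to about $n = 4$.

```python
from itertools import product
from math import comb, log

def is_strict_partial_order(rel):
    """rel is a set of ordered pairs (x, y), x != y, read as 'x is below y'."""
    if any((y, x) in rel for (x, y) in rel):              # asymmetry
        return False
    return all((x, w) in rel                              # transitivity
               for (x, y) in rel for (z, w) in rel if y == z)

def all_posets(n):
    """Yield every strict partial order on {0, ..., n-1} (brute force)."""
    pairs = [(x, y) for x in range(n) for y in range(n) if x != y]
    for bits in product((False, True), repeat=len(pairs)):
        rel = frozenset(p for p, b in zip(pairs, bits) if b)
        if is_strict_partial_order(rel):
            yield rel

# Number of labelled posets, trivial bound log_3 |P(n)|, and number of pairs.
poset_counts = {n: sum(1 for _ in all_posets(n)) for n in range(1, 5)}
for n, count in poset_counts.items():
    print(n, count, round(log(count, 3), 3), comb(n, 2))
```

For $n = 4$ the loop visits $2^{12}$ candidate relations; for $n = 5$ it would already visit $2^{20}$, so the value 4231 is taken from the text rather than recomputed here.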



2 The lower bound 

Given $P \in \mathcal{P}(n)$ and $a, b \in n$, the pairwise comparison $\{a, b\}$ determines $P|_{\{a,b\}}$, that
is, provides the information whether $a\,P\,b$, $b\,P\,a$, or neither. Let us define the covering
and anti-covering relations of $P$ by

$P^\vee := P \setminus P^2$ and $P^\wedge := \{(x, y) \in n \times n : x \neq y,\ Px \subseteq Py, \text{ and } yP \subseteq xP\} \setminus P$,

where $Py = \{x : x\,P\,y\}$ and $xP = \{y : x\,P\,y\}$. We consider algorithms which learn
a partial order $P \in \mathcal{P}(n)$ given a number $n \ge 1$ and an oracle for $P$, which is just
a subroutine that performs a pairwise comparison in $P$. The algorithms can learn $P$
only through oracle calls, each of which is assumed to take constant time. For any such
algorithm $\varphi$, let $c_\varphi(P)$ be the number of pairwise comparisons the algorithm needs
until it knows $P$. Then $e_\varphi(n) := \sum_{P \in \mathcal{P}(n)} c_\varphi(P) / |\mathcal{P}(n)|$ is the expected number of
pairwise comparisons for that algorithm. Finally, let $Q(P) := \{\{a, b\} : a\,P^\vee\,b$ or
$a\,P^\wedge\,b\}$.

Lemma 1 $c_\varphi(P) \ge |Q(P)|$ for all $\varphi$ and $P$.

Proof. Assume that $\varphi$ claims to know $P$ but has not compared the pair $\{a, b\} \in Q(P)$.
If $a\,P^\vee\,b$, put $P' := P \setminus \{(a, b)\}$, while if $a\,P^\wedge\,b$, put $P' := P \cup \{(a, b)\}$. Then $P'$ is
a partial order that would erroneously be recognized as $P$ by $\varphi$. □
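The definitions of $P^\vee$, $P^\wedge$ and $Q(P)$, and the perturbation used in the proof of Lemma 1, can be tried out on a small example. This is a minimal sketch, assuming a poset is stored as a set of ordered pairs; the helper names (`below`, `above`, `covering`, `anti_covering`, `essential_pairs`) are illustrative choices, not notation from the paper.

```python
def is_strict_partial_order(rel):
    """Asymmetry and transitivity of a set of ordered pairs (x, y), x != y."""
    if any((y, x) in rel for (x, y) in rel):
        return False
    return all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)

def below(rel, y):
    """Py: all elements below y."""
    return {x for (x, z) in rel if z == y}

def above(rel, x):
    """xP: all elements above x."""
    return {y for (z, y) in rel if z == x}

def covering(rel):
    """P minus P^2: relations that are not implied by transitivity."""
    squared = {(x, w) for (x, y) in rel for (z, w) in rel if y == z}
    return set(rel) - squared

def anti_covering(rel, n):
    """Pairs outside P that could be added without violating transitivity."""
    return {(x, y) for x in range(n) for y in range(n)
            if x != y and (x, y) not in rel
            and below(rel, x) <= below(rel, y)
            and above(rel, y) <= above(rel, x)}

def essential_pairs(rel, n):
    """Q(P) as a set of unordered pairs."""
    return {frozenset(p) for p in covering(rel) | anti_covering(rel, n)}

# Example: the chain 0 < 1 < 2 together with an isolated element 3.
n = 4
P = {(0, 1), (1, 2), (0, 2)}
cov, anti = covering(P), anti_covering(P, n)
# Lemma 1: dropping a covering pair or adding an anti-covering pair
# again gives a strict partial order, which differs from P in one pair only.
lemma_ok = (all(is_strict_partial_order(P - {p}) for p in cov) and
            all(is_strict_partial_order(P | {p}) for p in anti))
print(sorted(cov), sorted(anti), len(essential_pairs(P, n)), lemma_ok)
```

In this example the pairs $\{0, 2\}$ and $\{1, 3\}$ are not in $Q(P)$: the first is implied by transitivity, and the second cannot be turned into a relation without forcing further changes.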

For $R \in \mathcal{P}(4)$, for example, the average cardinality of $Q(R)$ is about 4.849, which
is smaller than the trivial lower bound of 4.905. But for $R \in \mathcal{P}(5)$ it is about 7.958,
which improves the trivial lower bound of 7.601.
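The average of $|Q(R)|$ over all $R \in \mathcal{P}(n)$ can be recomputed by enumeration for small $n$. The sketch below assumes the same set-of-pairs representation as before; `all_posets`, `q_size` and `q_statistics` are hypothetical helper names, and the brute-force enumeration is only practical up to about $n = 4$.

```python
from itertools import product

def all_posets(n):
    """Yield every strict partial order on {0, ..., n-1} (brute force)."""
    pairs = [(x, y) for x in range(n) for y in range(n) if x != y]
    for bits in product((False, True), repeat=len(pairs)):
        rel = frozenset(p for p, b in zip(pairs, bits) if b)
        asymmetric = not any((y, x) in rel for (x, y) in rel)
        transitive = all((x, w) in rel
                         for (x, y) in rel for (z, w) in rel if y == z)
        if asymmetric and transitive:
            yield rel

def q_size(rel, n):
    """|Q(P)|: unordered pairs that are covering or anti-covering."""
    below = {v: {x for (x, y) in rel if y == v} for v in range(n)}
    above = {v: {y for (x, y) in rel if x == v} for v in range(n)}
    squared = {(x, w) for (x, y) in rel for (z, w) in rel if y == z}
    pairs = {frozenset(p) for p in rel - squared}            # covering pairs
    pairs |= {frozenset((x, y)) for x in range(n) for y in range(n)
              if x != y and (x, y) not in rel
              and below[x] <= below[y] and above[y] <= above[x]}
    return len(pairs)

def q_statistics(n):
    """Return (sum of |Q(P)| over all posets on n elements, number of posets)."""
    sizes = [q_size(rel, n) for rel in all_posets(n)]
    return sum(sizes), len(sizes)

total, count = q_statistics(4)
print(total, count, round(total / count, 3))
```

Dividing the total by the number of posets gives the average that Lemma 1 turns into a lower bound on the expected number of comparisons.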

For the rest of this section, assume that $n$ is a multiple of 4. Let $L(n)$ be the set
of all ordered partitions $(A, B, C)$ of $n$ with $|A| = |C| = n/4$ and $|B| = n/2$. Put
$\mathcal{T}(n) := \bigcup_{(A,B,C) \in L(n)} \mathcal{T}_{ABC}(n)$, where $\mathcal{T}_{ABC}(n)$ is the set of all $P \in \mathcal{P}(n)$ which
fulfil (i) $x = y$ or $Px \not\subseteq Py$ or $yP \not\subseteq xP$ for all $(x, y) \in A^2 \cup B^2 \cup C^2$, and (ii)
$aP \cap B \cap Pc \neq \emptyset$ and $A \cap Pb \neq \emptyset \neq bP \cap C$ for all $(a, b, c) \in A \times B \times C$.
In particular, these posets consist of a lower level $A$ of $n/4$ minimal elements, an
antichain $B$ of size $n/2$ forming the middle level, and an upper level $C$ of $n/4$ maximal
elements, and no $C$-element covers an $A$-element. Moreover, (i) and (ii) imply that
$Q(P) = Q_{ABC} := (A \times B) \cup (B \times C)$.
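Conditions (i) and (ii) can be tested mechanically. The sketch below checks them for a hand-built 8-element poset and confirms that its $Q(P)$ coincides with $Q_{ABC}$, which has $n^2/4 = 16$ pairs. The example poset, the level labels and all function names are assumptions made for illustration; they do not come from the paper.

```python
from itertools import product

def transitive_closure(rel):
    closure = set(rel)
    while True:
        extra = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if extra <= closure:
            return closure
        closure |= extra

def below(rel, y):
    return {x for (x, z) in rel if z == y}

def above(rel, x):
    return {y for (z, y) in rel if z == x}

def in_T_ABC(rel, A, B, C):
    """Check conditions (i) and (ii); rel must be a strict partial order."""
    for level in (A, B, C):                                   # condition (i)
        for x, y in product(level, repeat=2):
            if (x != y and below(rel, x) <= below(rel, y)
                    and above(rel, y) <= above(rel, x)):
                return False
    for a, b, c in product(A, B, C):                          # condition (ii)
        if not (above(rel, a) & set(B) & below(rel, c)):
            return False
        if not (set(A) & below(rel, b)) or not (above(rel, b) & set(C)):
            return False
    return True

def essential_pairs(rel, n):
    """Q(P): covering and anti-covering pairs as unordered pairs."""
    squared = {(x, w) for (x, y) in rel for (z, w) in rel if y == z}
    anti = {(x, y) for x in range(n) for y in range(n)
            if x != y and (x, y) not in rel
            and below(rel, x) <= below(rel, y)
            and above(rel, y) <= above(rel, x)}
    return {frozenset(p) for p in (set(rel) - squared) | anti}

# Hypothetical example with n = 8: levels of sizes 2, 4, 2.
A, B, C = [0, 1], [2, 3, 4, 5], [6, 7]
P = transitive_closure({(0, 2), (0, 3), (1, 4), (1, 5),
                        (2, 6), (4, 6), (3, 7), (5, 7)})
Q_ABC = {frozenset(p) for p in list(product(A, B)) + list(product(B, C))}
print(in_T_ABC(P, A, B, C), essential_pairs(P, 8) == Q_ABC, len(Q_ABC))

# For n = 4 the level sizes are 1, 2, 1, and the diamond fails condition (i).
diamond = transitive_closure({(0, 1), (0, 2), (1, 3), (2, 3)})
print(in_T_ABC(diamond, [0], [1, 2], [3]))
```

The diamond fails because its two middle elements have identical sets of lower and upper neighbours, so the pair between them is anti-covering; this illustrates why the three-level structure with $Q(P) = Q_{ABC}$ is a phenomenon of large $n$.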

Lemma 2 $\frac{|\mathcal{T}(4m)|}{|\mathcal{P}(4m)|} = 1 - o(1/m)$.

Proof. Let $n = 4m$. Improving upon the original asymptotics of Kleitman and Rothschild
[gj], Brightwell, Prömel, and Steger [Jl[| showed that for some $K > 1$,
$|\mathcal{P}(n)| = |\mathcal{S}(n)|(1 + O(K^{-n}))$, where $\mathcal{S}(n) := \bigcup_{(A,B,C) \in L(n)} \mathcal{S}_{ABC}(n)$ and $\mathcal{S}_{ABC}(n)$ is the
set of all $P \in \mathcal{P}(n)$ with $P^\vee \subseteq Q_{ABC}$ and $|Px| > 1$ for all $x \in B \cup C$. On the other
hand, it is easy to see that $\mathcal{T}(n) \subseteq \mathcal{S}(n)$ and $|\mathcal{T}(n)|/|\mathcal{S}(n)| = 1 - o(1/n)$. Hence
$1 - |\mathcal{T}(n)|/|\mathcal{P}(n)| = 1 - \frac{|\mathcal{T}(n)|}{|\mathcal{S}(n)|} \cdot \frac{|\mathcal{S}(n)|}{|\mathcal{P}(n)|} = o(1/n)$. □

Because $P \in \mathcal{T}(n)$ implies $|Q(P)| = n^2/4$, it follows that $e_\varphi(n)$ has a lower
bound of $n^2/4 - o(n^2)$. Table 1 compares $n^2/4$ with $\log_3 |\mathcal{P}(n)|$ for some small
values of $n$ (based on numbers from [||]).



3 A simple algorithm 

Consider the algorithm $\varphi_3$ listed in Figure 2, which learns a partial order on $N$. If $N$
is a multiple of 4, the strategy of $\varphi_3$ first assumes that $P$ is a member of $\mathcal{T}(N)$. If
the assumption is true, $\varphi_3$ will determine the corresponding level partition $(A, B, C)$
in $o(n^2)$ expected time so that it can afterwards compare exactly the $n^2/4$ pairs in
$Q(P) = Q_{ABC}$. In the asymptotically unlikely case that $P \notin \mathcal{T}(N)$ it will detect that
fact and perform a comparison of all pairs.
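The saving described above can be illustrated with a comparison oracle that counts its calls. The sketch below is not an implementation of $\varphi_3$: it assumes the level partition $(A, B, C)$ is already known and that $P^\vee \subseteq Q_{ABC}$, and it omits both the level-finding loop and the uniqueness check of line 20. The class `CountingOracle`, the two learner functions and the 8-element example poset are illustrative assumptions.

```python
from itertools import combinations

class CountingOracle:
    """Answers pairwise comparisons in a hidden poset and counts the calls."""
    def __init__(self, rel):
        self._rel = set(rel)
        self.calls = 0

    def compare(self, a, b):
        self.calls += 1
        if (a, b) in self._rel:
            return (a, b)          # a is below b
        if (b, a) in self._rel:
            return (b, a)          # b is below a
        return None                # a and b are incomparable

def transitive_closure(rel):
    closure = set(rel)
    while True:
        extra = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if extra <= closure:
            return closure
        closure |= extra

def learn_all_pairs(oracle, n):
    """Fallback strategy: compare every pair."""
    answers = (oracle.compare(a, b) for a, b in combinations(range(n), 2))
    return {r for r in answers if r is not None}

def learn_with_known_levels(oracle, A, B, C):
    """Compare only the pairs in Q_ABC, then close under transitivity."""
    found = set()
    for lower, upper in ((A, B), (B, C)):
        for x in lower:
            for y in upper:
                result = oracle.compare(x, y)
                if result is not None:
                    found.add(result)
    return transitive_closure(found)

# Hypothetical 8-element poset with levels A, B, C of sizes 2, 4, 2.
A, B, C = [0, 1], [2, 3, 4, 5], [6, 7]
P = transitive_closure({(0, 2), (0, 3), (1, 4), (1, 5),
                        (2, 6), (4, 6), (3, 7), (5, 7)})
naive, levelled = CountingOracle(P), CountingOracle(P)
same_naive = learn_all_pairs(naive, 8) == P
same_levelled = learn_with_known_levels(levelled, A, B, C) == P
print(naive.calls, levelled.calls, same_naive, same_levelled)
```

With the levels known, only the $n^2/4$ pairs of $Q_{ABC}$ are queried instead of all $\binom{n}{2}$ pairs; the relations between $A$ and $C$ follow by transitivity.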

Although this is obviously not the best possible strategy, the amount of time $\varphi_3$
"wastes" becomes negligible for $N \to \infty$.

Theorem 1 $\varphi_3$ is an asymptotically optimal poset learning algorithm in the sense that
$e_{\varphi_3}(N) = N^2/4 + o(N^2)$.

Proof. Let $P \in \mathcal{P}(N)$. Because of lines 20-21, $\varphi_3$ learns $P$ completely.

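Let $\mathcal{U}(n) := \bigcup_{(A,B,C) \in L(n)} \mathcal{U}_{ABC}(n) \supseteq \mathcal{T}(n)$, where $\mathcal{U}_{ABC}(n) \supseteq \mathcal{T}_{ABC}(n)$ is
the set of all $P \in \mathcal{P}(n)$ with $P^\vee \subseteq Q_{ABC}$. We may assume that $P|_n \in \mathcal{U}_{A_0 B_0 C_0}(n)$
for some $(A_0, B_0, C_0) \in L(n)$, since by Lemma 2, $P|_n \in \mathcal{U}(n)$ is true with a
probability converging to 1 as $N \to \infty$. Note that $\alpha_n := 1 - |\mathcal{T}(n)|/|\mathcal{U}(n)| = o(1/n)$ is an
upper bound for the probability that at some point in $\varphi_3$, either $A \not\subseteq A_0$, $B \not\subseteq B_0$, or
$C \not\subseteq C_0$.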

Conditional on $P|_n \in \mathcal{U}_{A_0 B_0 C_0}(n)$, the event $x\,P\,y$ has probability $\frac{1}{2}$ independently
for all $(x, y) \in Q_{A_0 B_0 C_0}$. Hence one can estimate the expected number of pairwise
comparisons in iteration $k$ of the main loop as follows.

(i) Assume that $k \in B_0$. For $j := 1$ and $2 \le i \le r$, the disjunction in lines 9-12 is
violated with probability at most $\frac{3}{4} + \alpha_n$. Hence iteration $k$ takes an expected number
of at most

[...]

pairwise comparisons in this case.

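(ii) Assume that, on the other hand, $k \in A_0 \cup C_0$. For $1 \le i \le m$, the probability
that both the conditions of lines 15-16 are violated is at most $\frac{1}{2} + \alpha_n$ so that in this
case iteration $k$ takes an expected number of at most

$$\sum_{i=1}^{m} i \left(\tfrac{1}{2} + \alpha_n\right)^{i-1} + (n - 1)\left(\tfrac{1}{2} + \alpha_n\right)^{m} \le \left(\tfrac{1}{2} - \alpha_n\right)^{-2} + n\left(\tfrac{1}{2} + \alpha_n\right)^{m}$$

pairwise comparisons in this case.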
We have seen that, given $r$ and $m$, iteration $k$ takes in both cases an expected
number of at most $(16 + 4n(\frac{3}{4} + \alpha_n)^a)/\beta_n$ pairwise comparisons, where $a := \min\{r, m\}$
and $\beta_n \to 1$. At the beginning of iteration $k$, for $0 \le a \le \frac{k}{2}$, the probability that $r = a$
and $m = k - a$ is at most

$$\binom{k}{a} \binom{n-k}{n/2 - a} \binom{n}{n/2}^{-1}$$

and so is the probability that $m = a$ and $r = k - a$. In contrast, the probability that
$r + m \neq k$ is at most $\alpha_n$. In all, iteration $k$ takes an expected number of at most

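$$2 \sum_{a=0}^{\lfloor k/2 \rfloor} \frac{\binom{k}{a} \binom{n-k}{n/2 - a}}{\binom{n}{n/2}} \cdot \frac{16 + 4n\left(\frac{3}{4} + \alpha_n\right)^{a}}{\beta_n} + \alpha_n \binom{n}{2}$$

$$\le O(n)\, \frac{2^{n-k}}{\binom{n}{\lfloor n/2 \rfloor}} \sum_{a=0}^{k} \binom{k}{a} \left(\tfrac{3}{4} + \alpha_n\right)^{a} + o(n)$$

$$= O(n)\, \frac{2^{n}}{\binom{n}{\lfloor n/2 \rfloor}} \left(\tfrac{7}{8} + \tfrac{\alpha_n}{2}\right)^{k} + o(n) = O(n^{3/2}) \left(\tfrac{7}{8} + \tfrac{\alpha_n}{2}\right)^{k}$$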

pairwise comparisons, so that the total expected number of comparisons in lines 1-18
is $O(n^{3/2})$. If $P \in \mathcal{T}(N)$ then $P$ is uniquely determined in line 20, hence the expected
number of comparisons in lines 19-21 is $N^2/4 + o(N^2)$, proving the theorem. □
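The proof uses that, conditional on $P|_n \in \mathcal{U}_{A_0 B_0 C_0}(n)$, the relations on the pairs of $Q_{A_0 B_0 C_0}$ behave like independent fair coin flips. One way to see this is that every subset of $(A \times B) \cup (B \times C)$ is the trace of exactly one poset with $P^\vee \subseteq Q_{ABC}$, namely its transitive closure. The sketch below samples such a poset and checks this correspondence exhaustively for a tiny case; the function name `sample_U_ABC`, the seed and the level labels are assumptions made for illustration.

```python
import random
from itertools import product

def transitive_closure(rel):
    closure = set(rel)
    while True:
        extra = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if extra <= closure:
            return closure
        closure |= extra

def sample_U_ABC(A, B, C, rng):
    """Flip a fair coin for every pair of Q_ABC and close transitively."""
    q_abc = list(product(A, B)) + list(product(B, C))
    chosen = {pair for pair in q_abc if rng.random() < 0.5}
    return chosen, transitive_closure(chosen)

rng = random.Random(0)                      # fixed seed, for repeatability only
A, B, C = [0, 1], [2, 3, 4, 5], [6, 7]
chosen, P = sample_U_ABC(A, B, C, rng)
q_abc = set(product(A, B)) | set(product(B, C))
# The closure only adds pairs from A x C, so the coin flips can be read off P.
flips_recovered = (P & q_abc == chosen)
asymmetric = all((y, x) not in P for (x, y) in P)

# Exhaustive check for levels of sizes 1, 2, 1: 2^4 subsets, 2^4 distinct posets.
small = list(product([0], [1, 2])) + list(product([1, 2], [3]))
members = {frozenset(transitive_closure({p for p, b in zip(small, bits) if b}))
           for bits in product((0, 1), repeat=len(small))}
print(flips_recovered, asymmetric, len(members))
```

Under this correspondence the number of posets with $P^\vee \subseteq Q_{ABC}$ is $2^{|Q_{ABC}|} = 2^{n^2/4}$ for each fixed partition, in line with the order of magnitude $2^{n^2/4 + o(n^2)}$ quoted in the introduction.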
Figure 1: Optimal learning strategy for 4-element posets




Table 1: Comparison of lower bounds for $e_\varphi(n)$ for small $n$

$n$                            4      8      12     13     14     15
$n^2/4$                        4      16     36     42.25  49     56.25
$\log_3 |\mathcal{P}(n)|$      4.91   18.10  36.93  42.41  48.19  54.26

Figure 2: The asymptotically optimal algorithm $\varphi_3$

input: oracle $C$ for pairwise comparisons in a partial order $P$ on $N$
output: $P$

 1  put $A = B = C = \emptyset$
 2  find largest $n \le N$ with $4 \mid n$

main loop:
 3  for $k$ from $0$ to $n - 1$ do
 4    put $r = |A \cup C|$ and $m = |B|$
 5    assume $A \cup C = \{x_1, \ldots, x_r\}$, $B = \{y_1, \ldots, y_m\}$,
      and $n \setminus \{k\} = \{x_1, \ldots, x_{n-1}\} = \{y_1, \ldots, y_{n-1}\}$

    inner loop:
 6    for $i$ from $1$ to $n - 1$ do
 7      call $C(k, x_i)$
 8      if, for some $j < i$, either
 9        $x_i\,P\,k\,P\,x_j$, or
10        $x_j\,P\,k\,P\,x_i$, or
11        ($x_i\,P\,k$, $x_j \in A$, but not $x_j\,P\,k$), or
12        ($k\,P\,x_i$, $x_j \in C$, but not $k\,P\,x_j$)
13      then add $k$ to $B$ and continue in main loop
14      call $C(k, y_i)$
15      if $i \le m$ and $k\,P\,y_i$ then add $k$ to $A$ and continue in main loop
16      if $i \le m$ and $y_i\,P\,k$ then add $k$ to $C$ and continue in main loop
    end (of inner loop)

17    if $k$ is maximal ($\Leftrightarrow$ $k\,P\,y_i$ for no $i$) then add $k$ to $C$
18    if $k$ is minimal ($\Leftrightarrow$ $y_i\,P\,k$ for no $i$) then add $k$ to $A$
    end (of main loop)

19  for all $(x, y) \in Q_{ABC} \cup (n \times (N \setminus n))$ call $C(x, y)$
20  if the calls so far did not determine $P$ uniquely
21  then for all remaining pairs $(x, y)$ call $C(x, y)$
22  compute and print $P$.






